Toward Computational Cumulative Biology by Combining Models of Biological Datasets

Authors

  • Ali Faisal
  • Jaakko Peltonen
  • Elisabeth Georgii
  • Johan Rung
  • Samuel Kaski
Abstract

A main challenge of data-driven sciences is how to make maximal use of the progressively expanding databases of experimental datasets in order to keep research cumulative. We introduce the idea of a modeling-based dataset retrieval engine designed for relating a researcher's experimental dataset to earlier work in the field. The search is (i) data-driven, to enable new findings that go beyond the state of the art of keyword searches in annotations, (ii) modeling-driven, to include both biological knowledge and insights learned from data, and (iii) scalable, as it is accomplished without building one unified grand model of all data. Assuming each dataset has been modeled beforehand, by the researchers or automatically by database managers, we apply a rapidly computable and optimizable combination model to decompose a new dataset into contributions from earlier relevant models. By using the data-driven decomposition, we identify a network of interrelated datasets from a large annotated human gene expression atlas. While tissue type and disease were major driving forces for determining relevant datasets, the relationships found were richer, and the model-based search was more accurate than the keyword search. Moreover, it recovered biologically meaningful relationships that are not straightforwardly visible from annotations, for instance between cells in different developmental stages such as thymocytes and T-cells. Data-driven links and citations matched to a large extent; the data-driven links even uncovered corrections to the publication data, as two of the most linked datasets were not highly cited and turned out to have wrong publication entries in the database.
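The decomposition step described in the abstract can be illustrated with a small sketch. It assumes, purely for illustration, that the combination model is a mixture over the earlier models and that its weights are fitted by expectation-maximization; the abstract does not specify the form of the model, so the function name `fit_mixture_weights` and the input layout `log_lik[i][j]` (log-likelihood of sample i of the new dataset under earlier model j) are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def fit_mixture_weights(log_lik, n_iter=200, tol=1e-9):
    """Estimate how much each earlier model contributes to a new dataset.

    Assumption (not taken from the abstract): the new dataset is modeled
    as a mixture of the earlier models, and only the mixture weights are
    learned. log_lik[i][j] holds log p(sample i | earlier model j).
    Returns a list of non-negative weights that sum to 1.
    """
    n_samples = len(log_lik)
    n_models = len(log_lik[0])
    weights = [1.0 / n_models] * n_models
    prev_total = -math.inf

    for _ in range(n_iter):
        resp_sums = [0.0] * n_models
        total = 0.0
        for row in log_lik:
            # E-step: responsibility of each earlier model for this sample.
            joint = [
                math.log(w) + ll if w > 0 else -math.inf
                for w, ll in zip(weights, row)
            ]
            norm = logsumexp(joint)
            total += norm
            for j, v in enumerate(joint):
                resp_sums[j] += math.exp(v - norm)
        # M-step: new weight = average responsibility over samples.
        weights = [s / n_samples for s in resp_sums]
        # Stop once the log-likelihood no longer improves noticeably.
        if total - prev_total < tol:
            break
        prev_total = total
    return weights


# Toy input: three samples, two earlier models; model 0 fits far better.
toy_log_lik = [[-1.0, -10.0], [-1.2, -9.0], [-0.8, -11.0]]
print(fit_mixture_weights(toy_log_lik))
```

In this sketch the earlier models stay fixed and only the weights are optimized, which is one way a combination could remain fast without a single unified model of all data. Earlier datasets whose models receive large weights would then be returned as the most relevant search results.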

Similar resources

Electricity Load Forecasting by Combining Adaptive Neuro-fuzzy Inference System and Seasonal Auto-Regressive Integrated Moving Average

Nowadays, electricity load forecasting, as one of the most important areas, plays a crucial role in the economic process. What separates electricity from other commodities is the impossibility of storing it on a large scale and cost-effective construction of new power generation and distribution plants. Also, the existence of seasonality, nonlinear complexity, and ambiguity pattern in electrici...

Computational modeling of the EGF-receptor system: a paradigm for systems biology.

Computational models have rarely been used as tools by biologists but, when models provide experimentally testable predictions, they can be extremely useful. The epidermal growth factor receptor (EGFR) is probably the best-understood receptor system, and computational models have played a significant part in its elucidation. For many years, models have been used to analyze EGFR dynamics and to ...

Hybrid Dynamic Optimization Methods for Systems Biology with Efficient Sensitivities

In recent years, model optimization in the field of computational biology has become a prominent area for development of pharmaceutical drugs. The increased amount of experimental data leads to the increase in complexity of proposed models. With increased complexity comes a necessity for computational algorithms that are able to handle the large datasets that are used to fit model parameters. I...

Comparison of MLP NN Approach with PCA and ICA for Extraction of Hidden Regulatory Signals in Biological Networks

Biologists now face masses of high-dimensional datasets generated by various high-throughput technologies, which are outputs of complex interconnected biological networks at different levels driven by a number of hidden regulatory signals. So far, many computational and statistical methods such as PCA and ICA have been employed for computing low-dimensional or hidden represe...

A hybrid filter-based feature selection method via hesitant fuzzy and rough sets concepts

High-dimensional microarray datasets are difficult to classify since they have many features with a small number of instances and an imbalanced distribution of classes. This paper proposes a filter-based feature selection method to improve the classification performance of microarray datasets by selecting the significant features. Combining the concepts of rough sets, weighted rough set, fuzzy rough se...

Volume 9

Publication date: 2014